Search for: All records

Creators/Authors contains: "Sun, Yiwei"


  2. Pure bacterial cultures remain essential for detailed experimental and mechanistic studies in microbiome research, and traditional methods to isolate individual bacteria from complex microbial ecosystems are labor-intensive, difficult-to-scale and lack phenotype–genotype integration. Here we describe an open-source high-throughput robotic strain isolation platform for the rapid generation of isolates on demand. We develop a machine learning approach that leverages colony morphology and genomic data to maximize the diversity of microbes isolated and enable targeted picking of specific genera. Application of this platform on fecal samples from 20 humans yields personalized gut microbiome biobanks totaling 26,997 isolates that represented >80% of all abundant taxa. Spatial analysis on >100,000 visually captured colonies reveals cogrowth patterns between Ruminococcaceae, Bacteroidaceae, Coriobacteriaceae and Bifidobacteriaceae families that suggest important microbial interactions. Comparative analysis of 1,197 high-quality genomes from these biobanks shows interesting intra- and interpersonal strain evolution, selection and horizontal gene transfer. This culturomics framework should empower new research efforts to systematize the collection and quantitative analysis of imaging-based phenotypes with high-resolution genomics data for many emerging microbiome studies.
  4. The Internet is now a primary source of health information. The large volume of fake health news spreading over the Internet has become a severe threat to public health. Numerous studies have addressed fake news detection; however, few of them are designed to cope with the challenges specific to health news. For instance, fake health news detection requires explainable methods. To mitigate these problems, we construct a comprehensive repository, FakeHealth, which includes news content with rich features, news reviews with detailed explanations, social engagements, and a user-user social network. Moreover, exploratory analyses are conducted to understand the characteristics of the datasets, analyze useful patterns, and validate the quality of the datasets for fake health news detection. We also discuss novel and potential future research directions for fake health news detection.
  5. We consider the problem of learning predictive models from longitudinal data, consisting of irregularly repeated, sparse observations from a set of individuals over time. Such data often exhibit longitudinal correlation (LC) (correlations among observations for each individual over time), cluster correlation (CC) (correlations among individuals that have similar characteristics), or both. These correlations are often accounted for using mixed effects models that include fixed effects and random effects, where the fixed effects capture the regression parameters that are shared by all individuals, whereas random effects capture those parameters that vary across individuals. However, the current state-of-the-art methods are unable to select the most predictive fixed effects and random effects from a large number of variables while accounting for complex correlation structure in the data and non-linear interactions among the variables. We propose the Longitudinal Multi-Level Factorization Machine (LMLFM), to the best of our knowledge the first model to address these challenges in learning predictive models from longitudinal data. We establish the convergence properties, and analyze the computational complexity, of LMLFM. We present results of experiments with both simulated and real-world longitudinal data which show that LMLFM outperforms the state-of-the-art methods in terms of predictive accuracy, variable selection ability, and scalability to data with a large number of variables. The code and supplemental material are available at https://github.com/junjieliang672/LMLFM.
  8. We consider the problem of learning predictive models from longitudinal data, consisting of irregularly repeated, sparse observations from a set of individuals over time. Such data often exhibit longitudinal correlation (LC) (correlations among observations for each individual over time), cluster correlation (CC) (correlations among individuals that have similar characteristics), or both. These correlations are often accounted for using mixed effects models that include fixed effects and random effects, where the fixed effects capture the regression parameters that are shared by all individuals, whereas random effects capture those parameters that vary across individuals. However, the current state-of-the-art methods are unable to select the most predictive fixed effects and random effects from a large number of variables, while accounting for complex correlation structure in the data and non-linear interactions among the variables. We propose Longitudinal Multi-Level Factorization Machine (LMLFM), to the best of our knowledge, the first model to address these challenges in learning predictive models from longitudinal data. We establish the convergence properties, and analyze the computational complexity, of LMLFM. We present results of experiments with both simulated and real-world longitudinal data which show that LMLFM outperforms the state-of-the-art methods in terms of predictive accuracy, variable selection ability, and scalability to data with large number of variables. The code and supplemental material is available at https://github.com/junjieliang672/LMLFM.
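
The split between fixed and random effects described in the longitudinal-data abstract above can be illustrated with a textbook random-intercept model, y_ij = beta * x_ij + b_i + e_ij. The following is a minimal sketch under that assumption, not LMLFM itself; the function names, parameters, and the single-covariate setup are hypothetical and chosen only for illustration.

```python
import random


def simulate_longitudinal(n_individuals, beta, sigma_b, sigma_e, seed=0):
    """Simulate y_ij = beta * x_ij + b_i + e_ij (hypothetical toy model).

    beta    : fixed effect, shared by all individuals
    sigma_b : std. dev. of the random intercept b_i (varies per individual)
    sigma_e : std. dev. of the observation noise e_ij
    Returns a list of (individual_id, x, y) tuples.
    """
    rng = random.Random(seed)
    records = []
    for i in range(n_individuals):
        b_i = rng.gauss(0.0, sigma_b)   # random effect for individual i
        n_obs = rng.randint(2, 6)       # irregular number of repeated observations
        for _ in range(n_obs):
            x = rng.uniform(0.0, 1.0)
            y = beta * x + b_i + rng.gauss(0.0, sigma_e)
            records.append((i, x, y))
    return records


def residuals_by_individual(records, beta):
    """Group the residuals y - beta * x by individual.

    After removing the fixed effect, what remains is b_i plus noise, so
    residuals from the same individual share a common offset. That shared
    offset is what produces longitudinal correlation in this toy model.
    """
    grouped = {}
    for individual, x, y in records:
        grouped.setdefault(individual, []).append(y - beta * x)
    return grouped


if __name__ == "__main__":
    data = simulate_longitudinal(n_individuals=5, beta=2.0, sigma_b=1.0, sigma_e=0.1)
    for individual, res in residuals_by_individual(data, beta=2.0).items():
        mean = sum(res) / len(res)
        print(f"individual {individual}: {len(res)} obs, mean residual {mean:+.3f}")
```

In this sketch the per-individual mean residual approximates the random intercept b_i. Selecting which of many variables should enter as fixed or random effects, which is the problem the abstract targets, is outside its scope.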